deleeuwpdx.net/pubfolders/zeroes has a pdf copy of this article, the complete Rmd file with
all code chunks, the bib file, and the R source code.

1 Problem 

Multidimensional scaling (MDS) in this paper is minimization of the loss function stress,
introduced by Kruskal (1964a, 1964b). Stress is defined as

$$\sigma(X) := \sum\sum_{1\leq i<j\leq n} w_{ij}(\delta_{ij}-d_{ij}(X))^2 \tag{1}$$

and must be minimized over all $n\times p$ configurations $X$. Here $W=\{w_{ij}\}$ and
$\Delta=\{\delta_{ij}\}$ are matrices of, respectively, weights and dissimilarities. Both $W$
and $\Delta$ are symmetric, non-negative, and hollow (zero diagonal). The matrix
$D(X)=\{d_{ij}(X)\}$ has Euclidean distances between the rows of $X$, defined as

$$d_{ij}^2(X) := (x_i-x_j)'(x_i-x_j) = \mathrm{tr}\ X'A_{ij}X, \tag{2}$$

where, using unit vectors $e_i$ and $e_j$,

$$A_{ij} := (e_i-e_j)(e_i-e_j)'. \tag{3}$$

Thus dissimilarities between objects are approximated by distances between points in
low-dimensional Euclidean space.
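
The definition of stress can be sketched in a few lines of Python. This is only an illustration, not the R code that accompanies the article. It assumes the double sum in (1) runs over pairs $i<j$, and the function name `stress` and the three-object data are hypothetical.

```python
import math

def stress(X, W, Delta):
    """Sum over pairs i < j of w_ij * (delta_ij - d_ij(X))**2 (assumed reading of (1))."""
    n = len(X)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d_ij = math.dist(X[i], X[j])  # Euclidean distance between rows i and j
            total += W[i][j] * (Delta[i][j] - d_ij) ** 2
    return total

# Illustrative data: three objects, unit weights, unit dissimilarities.
W = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
Delta = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]

# Equilateral triangle with side 1: every distance matches its dissimilarity.
triangle = [[0.0, 0.0], [1.0, 0.0], [0.5, math.sqrt(3) / 2]]
# Three equally spaced points on a line: distances 1, 1, 2, so one pair is misfit.
line = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]

stress_triangle = stress(triangle, W, Delta)
stress_line = stress(line, W, Delta)
```

The triangle fits perfectly, so its stress is zero up to rounding, while the collinear configuration pays for the single pair at distance 2.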

Two matrices, introduced in MDS by De Leeuw (1977), play an important role in this article. 
They are both symmetric, doubly-centered, and diagonally dominant. The first matrix V is 


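$$v_{ij} := \begin{cases} -w_{ij} & \text{if } i\neq j,\\ \sum_{k\neq i} w_{ik} & \text{if } i=j,\end{cases}$$

and the second matrix, more precisely the matrix valued function, $B(X)$ is

$$b_{ij}(X) := \begin{cases} -\dfrac{w_{ij}\delta_{ij}}{d_{ij}(X)} & \text{if } d_{ij}(X)>0,\\ 0 & \text{if } d_{ij}(X)=0,\end{cases}
\qquad b_{ii}(X) := -\sum_{1\leq j\neq i\leq n} b_{ij}(X).$$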
An important result on zero distances at a local minimum was proved by De Leeuw (1984). 
It is based on a formula for the Dini directional derivative (see also De Leeuw, Groenen, and 
Mair (2016)). 

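Theorem 1: [Directional Derivative]

$$\lim_{\epsilon\downarrow 0}\frac{\sigma(X+\epsilon Y)-\sigma(X)}{\epsilon} = \mathrm{tr}\ Y'(V-B(X))X - \sum\{w_{ij}\delta_{ij}d_{ij}(Y)\mid d_{ij}(X)=0\}$$

Proof: Use

$$d_{ij}(X+\epsilon Y) = d_{ij}(X) + \epsilon\begin{cases}\dfrac{1}{d_{ij}(X)}\,\mathrm{tr}\ X'A_{ij}Y & \text{if } d_{ij}(X)>0,\\ d_{ij}(Y) & \text{if } d_{ij}(X)=0,\end{cases}\ +\ o(\epsilon)$$

to get the required result. QED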

Theorem 2: [Necessary Condition] If $\sigma$ has a local minimum at $X$ then $(V - B(X))X = 0$
and $d_{ij}(X) > 0$ for all $i\neq j$ such that $w_{ij}\delta_{ij} > 0$.

Proof: Direct from theorem 1. QED 

If $\sigma$ has a local minimum at $X$ and is differentiable at $X$, then the derivative vanishes.
In the usual MDS problems we have $w_{ij}\delta_{ij} > 0$ for all $i\neq j$, and theorem 2 says
that stress is differentiable at local minima.
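
The stationary equations $(V - B(X))X = 0$ of theorem 2 are easy to check numerically. The sketch below builds $V$ and $B(X)$ from their elementwise definitions and evaluates the residual at a square configuration for four objects with unit weights and unit dissimilarities. The data, the helper names, and the side length $(2+\sqrt{2})/4$ (obtained by minimizing stress over the scale of the square only) are assumptions made for illustration. The check shows stationarity, which is necessary but not sufficient for a local minimum.

```python
import math

def v_matrix(W):
    """V has off-diagonal elements -w_ij and diagonal elements sum_j w_ij."""
    n = len(W)
    return [[sum(W[i]) - W[i][i] if i == j else -W[i][j] for j in range(n)]
            for i in range(n)]

def b_matrix(X, W, Delta):
    """B(X) has off-diagonal elements -w_ij delta_ij / d_ij(X), or 0 if d_ij(X) = 0."""
    n = len(X)
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(X[i], X[j])
            if i != j and d > 0:
                B[i][j] = -W[i][j] * Delta[i][j] / d
        B[i][i] = -sum(B[i])  # diagonal makes the row sum to zero
    return B

def residual(X, W, Delta):
    """Rows of (V - B(X)) X."""
    V, B = v_matrix(W), b_matrix(X, W, Delta)
    n, p = len(X), len(X[0])
    return [[sum((V[i][k] - B[i][k]) * X[k][s] for k in range(n)) for s in range(p)]
            for i in range(n)]

n = 4
W = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
Delta = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]

side = (2 + math.sqrt(2)) / 4  # stress-optimal scale for a square (assumed example)
square = [[0.0, 0.0], [side, 0.0], [side, side], [0.0, side]]

R = residual(square, W, Delta)
max_residual = max(abs(r) for row in R for r in row)
min_distance = min(math.dist(square[i], square[j])
                   for i in range(n) for j in range(i + 1, n))
```

Here `max_residual` vanishes up to rounding and `min_distance` is positive, in line with the two conclusions of theorem 2.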


Points where the derivative exists and vanishes are stationary points, which are not necessarily
local minima. Because of the possibility of zero distances we need a more general notion
of a stationary point, which we now define as a point where $0\in\partial\sigma(X)$, with
$\partial\sigma(X)$ the subdifferential of $\sigma$ at $X$. We know (De Leeuw (1977)) that
$\sigma$ is the difference of a convex quadratic $\eta(X) := \mathrm{tr}\ X'VX$ and a convex
function $\rho(X) := \mathrm{tr}\ X'B(X)X$, and thus the subdifferential is simply a translation
of the convex subdifferential, or $\partial\sigma(X) = 2(VX - \partial\rho(X))$.

Theorem 3: [Stationary Points] We have $0\in\partial\sigma(X)$ if and only if for all $i<j$ with
$d_{ij}(X) = 0$ there exist $u_{ij}$ with $u_{ij}'u_{ij}\leq 1$ such that

$$(V-B(X))X = \sum\sum\{w_{ij}\delta_{ij}(e_i-e_j)u_{ij}' \mid d_{ij}(X)=0\}$$

Proof: This follows from the formula for the subdifferential of the distance function

$$\partial d_{ij}(X) = \begin{cases}\left\{\dfrac{1}{d_{ij}(X)}A_{ij}X\right\} & \text{if } d_{ij}(X)>0,\\ \{(e_i-e_j)u' \mid u'u\leq 1\} & \text{if } d_{ij}(X)=0.\end{cases}$$

QED

Note that theorem 3 implies that always $(V - B(X))X \in \partial\sigma(X)$, whether there are
zero distances or not.
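
A small one-dimensional example may help to see what theorem 3 adds. The data below are hypothetical: three objects with unit weights and dissimilarities $\delta_{12}=\delta_{13}=1$, $\delta_{23}=2$, and the configuration $X=(0,0,1.5)'$, in which objects 1 and 2 coincide. The sketch computes $(V-B(X))X$, which is not zero, and then recovers a scalar $u_{12}$ with $|u_{12}|\leq 1$ that satisfies the equation of theorem 3. So $X$ is a stationary point in the generalized sense, although by theorem 2 it cannot be a local minimum, because $w_{12}\delta_{12}>0$ while $d_{12}(X)=0$.

```python
import math

def v_matrix(W):
    """V has off-diagonal elements -w_ij and diagonal elements sum_j w_ij."""
    n = len(W)
    return [[sum(W[i]) - W[i][i] if i == j else -W[i][j] for j in range(n)]
            for i in range(n)]

def b_matrix(X, W, Delta):
    """B(X) has off-diagonal elements -w_ij delta_ij / d_ij(X), or 0 if d_ij(X) = 0."""
    n = len(X)
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(X[i], X[j])
            if i != j and d > 0:
                B[i][j] = -W[i][j] * Delta[i][j] / d
        B[i][i] = -sum(B[i])  # diagonal makes the row sum to zero
    return B

def residual(X, W, Delta):
    """Rows of (V - B(X)) X."""
    V, B = v_matrix(W), b_matrix(X, W, Delta)
    n, p = len(X), len(X[0])
    return [[sum((V[i][k] - B[i][k]) * X[k][s] for k in range(n)) for s in range(p)]
            for i in range(n)]

# Hypothetical data: objects 1 and 2 (indices 0 and 1) coincide in X.
W = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
Delta = [[0, 1, 1], [1, 0, 2], [1, 2, 0]]
X = [[0.0], [0.0], [1.5]]

R = residual(X, W, Delta)

# Only the pair (1, 2) has zero distance, so the right-hand side of theorem 3 is
# w_12 * delta_12 * (e_1 - e_2) * u, with rows (c*u, -c*u, 0) where c = w_12 * delta_12.
c = W[0][1] * Delta[0][1]
u = R[0][0] / c
rhs = [[c * u], [-c * u], [0.0]]
gap = max(abs(R[i][0] - rhs[i][0]) for i in range(3))
```

The residual comes out as approximately $(-0.5, 0.5, 0)'$, matched by $u_{12}=-0.5$, which lies inside the unit ball as theorem 3 requires.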


2 Zero Distances 

Theorem 2 tells us that at local minima $w_{ij}\delta_{ij} = 0$ if $d_{ij}(X) = 0$. But this leaves
open some interesting questions about stationary points and local minima that are not covered by
this result.

Can there be local minima with zero distances? If we allow for zero dissimilarities the
answer is clearly yes. Take $\Delta$ equal to $D(X)$ where the configuration $X$ has at least one
zero distance. Then $X$ gives the global minimum of stress, equal to zero. And a global minimum
is, of course, also a local minimum. If we allow for zero weights, then the answer is again yes,
even if all dissimilarities are positive. Suppose we have four objects with all dissimilarities 1.
We are fitting two-dimensional configurations. If all weights are one, the global minimum is
attained for four points in the corners of a square. If we set $w_{12} = 0$, however, we get
perfect fit by putting points in the corners of an equilateral triangle, with objects 1 and 2
together in one corner. Since this makes stress zero, it is the global minimum. If we set, in
addition, $w_{34} = 0$ then we get the global minimum by having two points on the line, one with
objects 1 and 2, and one with objects 3 and 4.
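
The two zero-weight examples can be verified directly. The sketch below is illustrative only: it assumes stress sums over pairs $i<j$, and the helper `unit_weights_without` and the particular coordinates are hypothetical choices. It also evaluates the triangle configuration under unit weights, to show that the zero distance is only free of cost when the corresponding weight vanishes.

```python
import math

def stress(X, W, Delta):
    """Sum over pairs i < j of w_ij * (delta_ij - d_ij(X))**2 (assumed reading of stress)."""
    n = len(X)
    return sum(W[i][j] * (Delta[i][j] - math.dist(X[i], X[j])) ** 2
               for i in range(n) for j in range(i + 1, n))

n = 4
Delta = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]

def unit_weights_without(pairs):
    """Unit weights, except w_ij = 0 for the listed 0-based index pairs."""
    W = [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
    for i, j in pairs:
        W[i][j] = W[j][i] = 0.0
    return W

h = math.sqrt(3) / 2
# Objects 1 and 2 share one corner of an equilateral triangle with side 1.
triangle = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.5, h]]
# Objects 1, 2 share one point and objects 3, 4 share another, at distance 1.
two_points = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [1.0, 0.0]]

stress_triangle = stress(triangle, unit_weights_without([(0, 1)]), Delta)
stress_two_points = stress(two_points, unit_weights_without([(0, 1), (2, 3)]), Delta)
stress_triangle_unit = stress(triangle, unit_weights_without([]), Delta)
```

With $w_{12}=0$ the triangle has zero stress, with $w_{12}=w_{34}=0$ so does the two-point configuration, and with all weights equal to one the triangle has stress 1 from the single pair (1, 2).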

A related question is whether there can be solutions to $(V - B(X))X = 0$ with zero distances.
We answer this more systematically. Suppose $X$ satisfies the stationary equations
$(V - B(X))X = 0$ and $X$ is of block form, with all rows within a block the same, and rows in
different blocks different. If there are $m$ blocks, then we can write $X = G\tilde{X}$, where $G$
is an $n\times m$ binary indicator matrix, indicating block membership, and the $m\times p$ matrix
$\tilde{X}$ contains the $m$ distinct rows. Suppose $I_j$ are the indices of the objects in
block $j$.
It follows that $(V - B(X))G\tilde{X} = 0$ and thus $(\tilde{V} - \tilde{B}(X))\tilde{X} = 0$, where
$\tilde{V} := G'VG$ and $\tilde{B}(X) := G'B(X)G$. We can write, somewhat suggestively,
$\tilde{B}(X) = \tilde{B}(\tilde{X})$. Use the fact that all distances in $D(X)$ between objects in
$I_j$ and objects in $I_\ell$ are equal to $d_{j\ell}(\tilde{X})$. Thus, for $j\neq\ell$,

$$\tilde{v}_{j\ell} := -\sum_{i\in I_j}\sum_{k\in I_\ell} w_{ik}, \tag{4}$$

$$\tilde{b}_{j\ell}(\tilde{X}) := -\frac{1}{d_{j\ell}(\tilde{X})}\sum_{i\in I_j}\sum_{k\in I_\ell} w_{ik}\delta_{ik}, \tag{5}$$

and the diagonal elements are filled in to make the matrices doubly-centered.

So an MDS problem with zero distances has a reduced or clustered set of stationary equations,
of the order of the number of clusters of points, which do not involve zero distances. This
reduced MDS problem can also be interpreted as the original MDS problem, but with the constraint
that there is some fixed clustering of points.

Conversely, if we have stationary equations for an MDS problem with non-zero weights and
distances, then we can always expand them to stationary equations for a problem with zero
distances, simply by finding $W$ and $\Delta$ that satisfy (4) and (5).
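
The reduction to a clustered problem can be illustrated with a small hypothetical example: four objects on a line, unit weights, unequal dissimilarities, and a configuration with two blocks of coinciding points. The sketch forms $G'VG$ and $G'B(X)G$ and checks that both the full and the reduced stationary equations hold. The data and helper names are assumptions, and the reduced weight and dissimilarity are read off from the off-diagonal elements as the block sum of the $w_{ik}$ and the weighted average of the $\delta_{ik}$.

```python
import math

def v_matrix(W):
    """V has off-diagonal elements -w_ij and diagonal elements sum_j w_ij."""
    n = len(W)
    return [[sum(W[i]) - W[i][i] if i == j else -W[i][j] for j in range(n)]
            for i in range(n)]

def b_matrix(X, W, Delta):
    """B(X) has off-diagonal elements -w_ij delta_ij / d_ij(X), or 0 if d_ij(X) = 0."""
    n = len(X)
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(X[i], X[j])
            if i != j and d > 0:
                B[i][j] = -W[i][j] * Delta[i][j] / d
        B[i][i] = -sum(B[i])  # diagonal makes the row sum to zero
    return B

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def subtract(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

# Hypothetical data: unit weights, dissimilarities that differ between pairs.
W = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]
Delta = [[0.0, 1.0, 0.5, 1.5],
         [1.0, 0.0, 1.5, 0.5],
         [0.5, 1.5, 0.0, 1.0],
         [1.5, 0.5, 1.0, 0.0]]
G = [[1, 0], [1, 0], [0, 1], [0, 1]]  # objects 1, 2 in block 1; objects 3, 4 in block 2
X_tilde = [[0.0], [1.0]]              # one row per block
X = matmul(G, X_tilde)                # rows 0, 0, 1, 1

V, B = v_matrix(W), b_matrix(X, W, Delta)
full_residual = matmul(subtract(V, B), X)

Gt = transpose(G)
V_red = matmul(matmul(Gt, V), G)      # G'VG
B_red = matmul(matmul(Gt, B), G)      # G'B(X)G
reduced_residual = matmul(subtract(V_red, B_red), X_tilde)

# Reduced weight and dissimilarity between the two blocks.
w_red = -V_red[0][1]
delta_red = -B_red[0][1] * math.dist(X_tilde[0], X_tilde[1]) / w_red
```

In this example the reduced problem has a single pair with weight 4 and dissimilarity 1, which the two cluster points at distance 1 fit perfectly, even though the original configuration does not fit the individual dissimilarities.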


3 Singularities 

Suppose $(V - B(X))X = 0$ and the $n\times p$ matrix $X$ has rank $r < p$. Define
$\tilde{X} = K\Lambda$, using the left singular vectors $K$ corresponding to the $r$ non-zero
singular values in $\Lambda$. Then $D(\tilde{X}) = D(X)$ and consequently
$(V - B(\tilde{X}))\tilde{X} = 0$. Thus we can reduce the problem to a non-singular one.

Conversely, if $(V - B(\tilde{X}))\tilde{X} = 0$ we can define $X = (\tilde{X} \mid 0)L$, with $L$
square orthonormal. Again $D(X) = D(\tilde{X})$ and thus again $(V - B(X))X = 0$. Thus we can
expand the problem to a singular one.
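
The expansion step is easy to try out. The sketch below covers only the converse direction (padding with a zero column and rotating), not the reduction by singular value decomposition. The triangle, the rotation angle, and the choice of $L$ as a rotation about the first axis are arbitrary illustrative assumptions.

```python
import math

def distances(X):
    """Matrix of Euclidean distances between the rows of X."""
    n = len(X)
    return [[math.dist(X[i], X[j]) for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# A two-dimensional configuration of full column rank (equilateral triangle).
X_tilde = [[0.0, 0.0], [1.0, 0.0], [0.5, math.sqrt(3) / 2]]

# (X_tilde | 0): append a zero column, giving an n x 3 matrix of rank 2.
padded = [row + [0.0] for row in X_tilde]

# L is a 3 x 3 rotation, hence square orthonormal; the angle is arbitrary.
theta = 0.7
c, s = math.cos(theta), math.sin(theta)
L = [[1.0, 0.0, 0.0],
     [0.0, c, -s],
     [0.0, s, c]]

X = matmul(padded, L)  # singular configuration in three dimensions

D_small, D_big = distances(X_tilde), distances(X)
max_gap = max(abs(D_small[i][j] - D_big[i][j]) for i in range(3) for j in range(3))
```

Because $B(X)$ depends on $X$ only through $D(X)$, equal distances give $B(X) = B(\tilde{X})$, so $(V - B(X))X = ((V - B(\tilde{X}))\tilde{X} \mid 0)L$ vanishes whenever the smaller problem is stationary.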